INTRODUCTION

Bayesian inference, or Bayesian statistics, is an approach to statistical
inference based on the theory of subjective probability. The term 'Bayesian'
arises from an elementary theorem of probability theory named after the
Rev. Thomas Bayes, who first enunciated it and proposed its use in inference.
Since 1950, many statisticians have taken an active interest in this subject;
hence the term "neo-Bayesian" is sometimes used instead of 'Bayesian'.

BAYESIAN  INFERENCES 

Bayesian inference involves a priori and a posteriori probability
distributions. A distribution which is assessed prior to the sample evidence
is known as an a priori distribution. The term 'posterior' means after the
sample evidence.

Suppose that before an experiment begins it can be assumed that p_i is the
probability that F_i is the true distribution of X. If the experiment consists
of observations on X_1, ..., X_n, the a posteriori probability that F is
equal to F_i can be computed after the sample x = (x_1, ..., x_n) has been
drawn, provided the a priori probability distribution is known or can be
assumed. The a posteriori probability, denoted by p_{ix}, is the conditional
probability of F_i given the observed values x_1, ..., x_n. If p is discrete,
the a posteriori probability function is given by

(1)     p_{ix} = \frac{p_i f(x|F_i)}{\sum_{\Omega} p_j f(x|F_j)}

and for the continuous case

(2)     p_{ix} = \frac{p_i f(x|F_i)}{\int_{\Omega} p f(x|F) dp}

For any element F of \Omega, f(x|F) in the above equations denotes the
probability density function of X. The expression for p_{ix} is also known as
Bayes' formula.

Consider the problem of making inferences about a Bernoulli process
with parameter \tilde{p}. Suppose that no direct sample evidence from the
process has been obtained. Based on experience with similar processes, general
knowledge, etc., one may be willing to translate judgments about the process
into probabilistic terms. As such, the probability distribution for \tilde{p}
(the tilde indicates that the parameter p is considered a random variable) may
be considered to be "subjective". Suppose the a priori distribution of
\tilde{p} is uniform on the interval (0, 1). The probability that \tilde{p}
lies in a subinterval is then that subinterval's length, no matter where the
subinterval is located between 0 and 1. The probability of observing a sample
such as head, head, and tail on three tosses of a coin, given that the
probability of observing a head is p, is p^2(1-p). This function is known as a
likelihood function. Through use of Bayes' theorem one can obtain the
a posteriori distribution of \tilde{p} using the likelihood function and the
a priori distribution of \tilde{p}. In terms of inferences about \tilde{p},
Bayes' theorem is written as


(3)     f(p|r,n) = \frac{f(p) p^r (1-p)^{n-r}}{\int_0^1 f(p) p^r (1-p)^{n-r} dp}

where   f(p) = a priori density of p,
        p^r (1-p)^{n-r} = likelihood if r heads are observed in n trials,
        f(p|r,n) = a posteriori density of p given the sample evidence.

The integral in the denominator can be regarded as a normalizing factor so
that f(p|r,n) will be a density function. It is also the probability, in the
light of the a priori distribution, of obtaining the sample actually observed.
In the above example

        f(p) = 1   (0 \le p \le 1),   r = 2,   n = 3,   and

        \int_0^1 f(p) p^r (1-p)^{n-r} dp = \int_0^1 p^2 (1-p) dp = 1/12

so      f(p|r = 2, n = 3) = 12 p^2 (1-p)     for 0 \le p \le 1
                          = 0                elsewhere

Under squared-error loss, the best Bayesian point estimate can be shown to be
the mean of the a posteriori distribution. In the given example, this would be

        \int_0^1 12 p \cdot p^2 (1-p) dp = 3/5 = 0.6

It may be noticed that the a posteriori probability that the coin is
"biased" in favor of heads, that is, that p exceeds 1/2, is 11/16.

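The arithmetic of the coin example can be checked exactly. The following is a
minimal sketch in Python, assuming the uniform prior and the sample of two
heads in three tosses used above; the helper name beta_integral is
illustrative and does not come from the text.

```python
from fractions import Fraction
from math import comb

def beta_integral(a, b, lo=Fraction(0), hi=Fraction(1)):
    """Exact integral of p**a * (1 - p)**b over [lo, hi] for integers a, b >= 0."""
    total = Fraction(0)
    for j in range(b + 1):
        # Binomial expansion: (1 - p)**b = sum_j C(b, j) * (-p)**j
        coef = comb(b, j) * (-1) ** j
        power = a + j + 1
        total += Fraction(coef, power) * (hi ** power - lo ** power)
    return total

r, n = 2, 3                                   # two heads in three tosses
norm = beta_integral(r, n - r)                # denominator of (3) with f(p) = 1
post_mean = beta_integral(r + 1, n - r) / norm
prob_above_half = beta_integral(r, n - r, lo=Fraction(1, 2)) / norm
print(norm, post_mean, prob_above_half)       # prints: 1/12 3/5 11/16
```

The three printed values are the normalizing constant, the posterior mean, and
the posterior probability that p exceeds 1/2.
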
THE  LIKELIHOOD  PRINCIPLE 

The only inputs needed for a Bayesian analysis are the likelihood function
and the a priori distribution. Thus the import of the sample evidence is
fully reflected in the likelihood function, a principle that is known as the
likelihood principle. If one wants to perform his own Bayesian analysis, he
needs the likelihood function. He need not be content with a distribution
based on someone else's a priori distribution, nor with a traditional analysis
such as a significance test, from which it may be difficult or impossible to
recover the likelihood function.

PROBABILISTIC  PREDICTION 

The idea of calculating the probability of a sample in the light of
different prior distributions has important consequences. For example, the
denominator on the right-hand side of Bayes' formula (3) for Bernoulli
sampling can be interpreted as the probability of obtaining the particular
sample actually observed, given the a priori distribution of p. While a
person's subjective probability distribution of p cannot be said to be "right"
or "wrong", there are better and worse subjective distributions, and the
criterion for judging them might be predictive accuracy. Thus if A and B each
have a distribution for p and a new sample is then observed, one can calculate
the probability of the sample in the light of each a priori distribution. The
ratio of these probabilities, technically a marginal likelihood ratio,
measures the extent to which the data favor A over B or vice versa.
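
As a rough illustration of the marginal likelihood ratio, the sketch below
assumes that A and B both hold Beta priors for p (an assumption made here for
convenience; the text does not restrict the form of the priors). Under a
Beta(a, b) prior the probability of one particular sequence with r heads in n
trials is B(a + r, b + n - r) / B(a, b). The prior parameters chosen for B are
hypothetical.

```python
from math import lgamma, exp

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def marginal_likelihood(r, n, a, b):
    """Probability of one particular sequence with r heads in n trials
    when p has a Beta(a, b) prior."""
    return exp(log_beta(a + r, b + n - r) - log_beta(a, b))

m_A = marginal_likelihood(2, 3, 1.0, 1.0)   # A: uniform prior, gives 1/12
m_B = marginal_likelihood(2, 3, 5.0, 5.0)   # B: hypothetical prior concentrated near 1/2
ratio = m_A / m_B                           # values below 1 favor B's prior
print(m_A, m_B, ratio)
```

For the sample head, head, tail the ratio is 11/15, so these data mildly favor
the prior concentrated near 1/2.
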

MULTIVARIATE  INFERENCE  AND  NUISANCE  PARAMETERS 

Consider inferences about the mean \mu of a normal distribution with
unknown variance \sigma^2. In this case, one begins with a joint prior
distribution for \mu and \sigma^2. The likelihood function is now a function
of the two variables \mu and \sigma^2. If interest centers only on \mu, then
\sigma^2 is said to be a nuisance parameter. In principle, it is easy to deal
with a nuisance parameter: simply integrate it out of the a posteriori
distribution. This means that one must find the marginal distribution of \mu
from the joint a posteriori distribution of \mu and \sigma^2. Multivariate
problems and nuisance parameters can be dealt with by such an approach.

DESIGN  OF  EXPERIMENTS  AND  SURVEYS 

In the above discussion, attention centered on the analysis of samples,
without concern for the kind and amount of sample evidence that should be
obtained. This problem is called the design problem. The Bayesian solution
of a design problem requires that one look beyond the a priori distribution
to the ultimate decisions that will be made in the light of this
distribution. The question of best design depends on the purposes to be served
by collecting the data. Given the specific purpose and the principle of
maximization of expected utility, it is possible to calculate the expected
utility of the best act for any particular sample outcome. This calculation
is repeated for each possible sample outcome for a given sample design. Next,
one weights each of these utilities by the probability of the corresponding
outcome in the light of the a priori distribution. This gives an overall
expected utility for any proposed design. Finally, one picks the sample design
with the highest expected utility. In two-action problems, for example
deciding whether a new medical treatment is better or worse than a standard
treatment, this procedure is in no conflict with the traditional approach of
selecting designs by comparing operating characteristics, although it
formalizes certain things (prior probabilities and utilities) that often are
treated intuitively in the traditional approach.
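
The design calculation described above can be sketched for a small two-action
problem. Everything numerical here is a hypothetical illustration rather than
part of the text: the prior for the success probability p of the new treatment
is taken as uniform, adopting the new treatment is given utility p - 1/2,
keeping the standard treatment is given utility 0, and each observation costs
1/1000.

```python
from fractions import Fraction

def expected_utility(n, cost_per_obs):
    """Overall expected utility of observing n Bernoulli trials before acting."""
    total = Fraction(0)
    for r in range(n + 1):
        prob_r = Fraction(1, n + 1)            # predictive probability of r successes, uniform prior
        post_mean = Fraction(r + 1, n + 2)     # mean of the Beta(r + 1, n - r + 1) posterior
        # Best act for this outcome: adopt the new treatment only if it pays.
        best_act = max(post_mean - Fraction(1, 2), Fraction(0))
        total += prob_r * best_act
    return total - cost_per_obs * n

cost = Fraction(1, 1000)                       # hypothetical cost per observation
designs = range(0, 31)                         # candidate sample sizes
best_n = max(designs, key=lambda n: expected_utility(n, cost))
print(best_n, expected_utility(best_n, cost))
```

Each candidate design is scored by averaging the utility of the best act over
the sample outcomes, and the design with the highest score is kept.
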


DERIVATION  OF  THE  t-TEST  VIA  BAYES'  THEOREM 

As has been noted, according to the Bayesian argument there exist
a priori distributions for the mean \mu and variance \sigma^2. Assume that
the local a priori distributions of the parameters \mu and \sigma^2 are
independent. Also assume that the a priori distribution of \mu is locally
uniform. Now the Savage (1960) principle of precise measurement says

... that we do not need to know exactly what the a priori distribution
of \mu is if we can say only that in the region in which the
likelihood is appreciable it does not change very much, and at no other
point is it of sufficiently great magnitude as to become appreciable
when multiplied by the likelihood. This principle would be applicable
in situations where the likelihood dominates but is not applicable in
situations where the a priori probability density dominates.

The importance of this principle lies in the fact that in actual practice
most experiments are conducted only when it is expected that the likelihood
will exert a much stronger influence on the final result than the a priori
distribution. Otherwise, there is little point in doing the experiment. For
example, suppose that the value of the gravitational constant in suitable
units had been estimated as 32.2 ± 0.1. Then there would be little
justification for making further measurements with a method whose accuracy
was, say, ± 0.2, but considerable justification for conducting further
experiments using a method whose accuracy was ± 0.02.

The argument that if \mu is taken as locally uniform, then \log\mu, 1/\mu,
etc., will not be, loses its force if it is remembered that unless the range
of values of \mu over which the likelihood is appreciable is large compared
with the average magnitude of \mu over the same range, such transformations
will make little practical difference in the range considered. In the example
considered above, for instance, if the a priori distribution of \mu were
assumed uniform from, say, \mu = 32.0 to \mu = 32.4, then to a close
approximation the a priori distributions of, for example, \log\mu and 1/\mu
would be approximately uniform over the corresponding ranges.

Assume also that either \sigma or its logarithm or some power of \sigma has a
distribution which is locally uniform. Then

(4)   p_1(\mu) \propto k,   p_2(\sigma) \propto \sigma^{0}    if the distribution of \sigma is assumed uniform
                            p_2(\sigma) \propto \sigma^{-1}   if the distribution of \log\sigma is assumed uniform

where k is a constant and "\propto" means "proportional to".

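Let \ell(\mu,\sigma|Y) denote the likelihood function given the sample Y;
then the a posteriori distribution for \mu and \sigma would be

(5)   p(\mu,\sigma|Y) = k \ell(\mu,\sigma|Y) p_1(\mu) p_2(\sigma)

where   k^{-1} = \iint_R \ell(\mu,\sigma|Y) p_1(\mu) p_2(\sigma) d\mu d\sigma

(6)   Now   p(\mu,\sigma|Y) = p(\mu|\sigma,\bar{y}) p(\sigma|s)

where   p(\mu|\sigma,\bar{y}) = [n/(2\pi\sigma^2)]^{1/2} \exp\{ -\frac{1}{2}(n/\sigma^2)(\mu-\bar{y})^2 \}

and     p(\sigma|s) = 2 \{\Gamma[\frac{1}{2}(\nu-q)]\}^{-1} (\frac{1}{2}\nu s^2)^{(\nu-q)/2} \sigma^{-(\nu-q+1)} \exp\{ -\frac{1}{2}\nu s^2/\sigma^2 \}

        (\nu = n-1, and q < \nu; q = 0 when \log\sigma is taken as locally
        uniform, q = 1 when \sigma is, and q = 2 when \sigma^2 is)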
On integrating out \sigma one obtains

        p(\mu|Y) = p(t_{\nu-q})

where p(t_{\nu-q}) is the t-distribution with \nu - q degrees of freedom.
In particular, if \log\sigma is assumed to be locally uniform, then the
a posteriori distribution of \mu is a t-distribution with \nu = n-1 d.f. If
\sigma is assumed locally uniformly distributed, then the a posteriori
distribution will be a t-distribution with (n-2) d.f., and if \sigma^2 is
locally uniform then one obtains the t-distribution with (n-3) degrees of
freedom.
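
The degrees of freedom quoted above can be checked numerically. The sketch
below is an illustration under stated assumptions: the prior for \sigma is
written as \sigma^{-k} (k = 1 for \log\sigma uniform, k = 0 for \sigma
uniform, k = -1 for \sigma^2 uniform), \sigma is integrated out with a
trapezoid rule in \log\sigma, and the result is compared with a t kernel with
n + k - 2 degrees of freedom. The data and function names are hypothetical.

```python
from math import exp, log

def marginal_kernel(mu, y, k, grid=4000):
    """Unnormalized marginal posterior of mu when p(mu) is uniform and
    p(sigma) is proportional to sigma**(-k); sigma is integrated out numerically."""
    n = len(y)
    ybar = sum(y) / n
    A = sum((v - ybar) ** 2 for v in y) + n * (ybar - mu) ** 2
    lo, hi = log(1e-3), log(1e3)
    h = (hi - lo) / grid
    total = 0.0
    for i in range(grid + 1):
        sigma = exp(lo + i * h)
        # likelihood * prior * Jacobian (d sigma = sigma d log sigma)
        f = sigma ** (-(n + k) + 1) * exp(-A / (2.0 * sigma ** 2))
        total += f * (0.5 if i in (0, grid) else 1.0)
    return total * h

def t_kernel(mu, y, df):
    """Kernel of a t density with df degrees of freedom centered at ybar,
    with scale chosen so that t**2 / df = n * (mu - ybar)**2 / S."""
    n = len(y)
    ybar = sum(y) / n
    S = sum((v - ybar) ** 2 for v in y)
    return (1.0 + n * (mu - ybar) ** 2 / S) ** (-(df + 1) / 2.0)

y = [4.9, 5.3, 5.1, 4.7, 5.6, 5.0]            # hypothetical observations
ybar = sum(y) / len(y)
for k, label in [(1, "log sigma uniform"), (0, "sigma uniform"), (-1, "sigma^2 uniform")]:
    df = len(y) + k - 2
    numeric = marginal_kernel(ybar + 0.4, y, k) / marginal_kernel(ybar, y, k)
    exact = t_kernel(ybar + 0.4, y, df) / t_kernel(ybar, y, df)
    print(label, df, numeric, exact)
```

For each prior the two ratios agree, with n-1, n-2 and n-3 degrees of freedom
respectively.
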

SELECTION  OF  THE  PARENT  DISTRIBUTION 

Assume that the parent distribution is a member of a class of symmetric
distributions which includes, in particular, the normal, together with other
distributions on the one hand more leptokurtic, and on the other hand more
platykurtic, than the normal. A convenient choice is the class of power
distributions employed by Diananda (1949), Box (1953), and Turner (1960),

(7)   p(y|\mu,\sigma,\beta) = \omega \sigma^{-1} \exp\{ -\frac{1}{2} |\frac{y-\mu}{\sigma}|^{2/(1+\beta)} \}

where   \omega^{-1} = \Gamma[1 + \frac{1}{2}(1+\beta)] 2^{[1 + \frac{1}{2}(1+\beta)]}

        (-\infty < y < \infty,  0 < \sigma < \infty,  -\infty < \mu < \infty,  -1 < \beta \le 1)

and \beta denotes a non-normality parameter. In particular, when \beta = 0,
one has the normal distribution; when \beta is 1, it turns out to be the
double exponential; and when \beta -> -1, the distribution tends to the
uniform distribution.

When two samples are drawn from possibly different members of this class,
the joint probability density will depend upon six unknown parameters, i.e., a
set (\beta_1, \mu_1, \sigma_1) associated with the first sample and a set
(\beta_2, \mu_2, \sigma_2) with the other. It will be assumed throughout the
remaining discussion that the parents have the same parameter \beta, and the
ratio \sigma_2^2/\sigma_1^2 of the scale parameters is the variance ratio.
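
The power family can be explored numerically. The sketch below assumes the
form \omega \sigma^{-1} \exp\{-\frac{1}{2}|(y-\mu)/\sigma|^{2/(1+\beta)}\} with
\omega^{-1} = \Gamma[1+\frac{1}{2}(1+\beta)] 2^{1+\frac{1}{2}(1+\beta)}; the
function names are illustrative. It checks that the density integrates to
about one and that \beta = 0 reproduces the normal density.

```python
from math import gamma, exp, sqrt, pi

def power_density(y, mu, sigma, beta):
    """Symmetric power density with non-normality parameter beta, -1 < beta <= 1."""
    half = 0.5 * (1.0 + beta)
    omega = 1.0 / (gamma(1.0 + half) * 2.0 ** (1.0 + half))
    return omega / sigma * exp(-0.5 * abs((y - mu) / sigma) ** (2.0 / (1.0 + beta)))

def integrate(f, lo, hi, steps=20000):
    """Midpoint rule, adequate for a quick normalization check."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

for beta in (-0.5, 0.0, 1.0):
    area = integrate(lambda v: power_density(v, 0.0, 1.0, beta), -60.0, 60.0)
    print(beta, area)

normal_at_07 = exp(-0.5 * 0.7 ** 2) / sqrt(2.0 * pi)
print(power_density(0.7, 0.0, 1.0, 0.0), normal_at_07)
```

Negative \beta gives a flatter, shorter-tailed density than the normal, and
positive \beta a more peaked, longer-tailed one.
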




DERIVATION OF THE POSTERIOR DISTRIBUTION OF \mu
FOR A SPECIFIC SYMMETRIC PARENT

Suppose one selects a parent distribution as given above with \beta assumed
to have a fixed value \beta_0. By doing so, one adopts the same assumptions
a priori as are necessary to derive the t-distribution when \beta is assumed
to be zero. One has

(8)   \ell(\mu,\sigma|Y,\beta_0) = \{ \Gamma[1 + \frac{1}{2}(1+\beta_0)] 2^{[1 + \frac{1}{2}(1+\beta_0)]} \}^{-n} \sigma^{-n} \exp\{ -\frac{1}{2} \sum_i |\frac{y_i-\mu}{\sigma}|^{2/(1+\beta_0)} \},

      p_1(\mu) \propto k,   p_2(\sigma) \propto \sigma^{-1}

so that

(9)   p(\mu,\sigma|Y,\beta_0) = k \sigma^{-(n+1)} \exp\{ -\frac{1}{2} \sum_i |\frac{y_i-\mu}{\sigma}|^{2/(1+\beta_0)} \}

assuming at least two of the observations are not equal, where

      k^{-1} = \iint_R \sigma^{-(n+1)} \exp\{ -\frac{1}{2} \sum_i |\frac{y_i-\mu}{\sigma}|^{2/(1+\beta_0)} \} d\mu d\sigma

By integrating out \sigma, one obtains for the a posteriori distribution of
\mu, for any fixed \beta = \beta_0 in the permissible range, the simple
expression

(10)  p(\mu|Y,\beta_0) = k [M(\mu)]^{-\frac{1}{2} n (1+\beta_0)}

where M(\mu) = \sum_i |y_i - \mu|^{2/(1+\beta_0)}

and M(\mu)/n is the absolute moment of order 2/(1+\beta_0) of the
observations about \mu. The integral

      k^{-1} = \int [M(\mu)]^{-\frac{1}{2} n (1+\beta_0)} d\mu

is merely a normalizing factor which ensures that the total area under the
distribution is unity. Usually it is difficult to express it as a simple
function, but it can be computed easily by use of computers.

Since p(\mu|Y,\beta_0) is a monotonic function of M(\mu), it follows that

(i) p(\mu|Y,\beta_0) is continuous, differentiable and unimodal, although not
necessarily symmetric, the mode being attained in the interval
[y_{(1)}, y_{(n)}], where y_{(n)} and y_{(1)} are respectively the largest and
the smallest of the observations.

(ii) When \beta_0 = 0, M(\mu) = \sum (y_i-\mu)^2 = (n-1)s^2 + n(\bar{y}-\mu)^2,
and making the necessary substitution in (10), one obtains for the
a posteriori distribution of \mu

        p(\mu|Y,\beta_0 = 0) = p(t_{n-1}),   as obtained earlier.


(iii) When \beta_0 -> -1,

        \lim_{\beta_0 \to -1} [M(\mu)]^{\frac{1}{2}(1+\beta_0)} = (h + |m - \mu|)

and making the necessary substitutions

(11)    \lim_{\beta_0 \to -1} p(\mu|Y,\beta_0) = k [h + |m - \mu|]^{-n}

where h = \frac{1}{2}(y_{(n)} - y_{(1)}) and m = \frac{1}{2}(y_{(1)} + y_{(n)}),
with y_{(1)} and y_{(n)} the smallest and the largest observations, and

        k^{-1} = \int (h + |m - \mu|)^{-n} d\mu

so that

        \lim_{\beta_0 \to -1} p\{ (n-1)|m - \mu|/h \,|\, Y,\beta_0 \} = p(F_{2, 2(n-1)})

Thus notice that when the parent is normal (\beta_0 = 0) the expression (10)
yields the t-distribution, and when the parent distribution approaches the
uniform (\beta_0 -> -1), the expression (10) gives the double F-distribution
with 2 and 2(n-1) d.f. In each of these cases, the a posteriori distribution
can be expressed in terms of simple functions of the observations which
provide the minimal sufficient statistics for \mu and \sigma (Box and
Tiao (1962)).

(iv) In certain other cases, it is possible to express the a posteriori
distribution of \mu in terms of a fixed number of functions of the
observations. For instance, when

        \beta_0 = (1-q)/q   (q = 1, 2, 3, ...), one has

(12)    p(\mu,\sigma|Y,\beta_0) \propto \sigma^{-(n+1)} \exp\{ -\frac{1}{2} \sigma^{-2q} \sum_{r=0}^{2q} (-1)^r \binom{2q}{r} S_{2q-r} \mu^r \}

and

(13)    p(\mu|Y,\beta_0) \propto [ \sum_{r=0}^{2q} (-1)^r \binom{2q}{r} S_{2q-r} \mu^r ]^{-n/2q}

where S_r = \sum_i y_i^r (Box and Tiao (1962)),

and it is seen that the set of 2q functions S_1, S_2, ..., S_{2q}
of the observations are jointly sufficient for \mu and \sigma.

In general, however, the a posteriori distribution cannot be expressed
in terms of a few functions of the observations and the minimal sufficient
statistics are the observations themselves.
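
Because the normalizing integral rarely has a closed form, the posterior of
\mu for a fixed \beta_0 is conveniently evaluated on a grid. The following
sketch assumes the kernel [M(\mu)]^{-n(1+\beta_0)/2} with
M(\mu) = \sum |y_i - \mu|^{2/(1+\beta_0)}; the data, grid and function names
are hypothetical.

```python
def M(mu, y, beta0):
    """Sum of absolute deviations of order 2 / (1 + beta0) about mu."""
    p = 2.0 / (1.0 + beta0)
    return sum(abs(v - mu) ** p for v in y)

def posterior_mu(y, beta0, grid):
    """Posterior density of mu on an equally spaced grid, normalized by the trapezoid rule."""
    n = len(y)
    kernel = [M(mu, y, beta0) ** (-0.5 * n * (1.0 + beta0)) for mu in grid]
    h = grid[1] - grid[0]
    area = h * (sum(kernel) - 0.5 * (kernel[0] + kernel[-1]))
    return [k / area for k in kernel]

y = [4.9, 5.3, 5.1, 4.7, 5.6, 5.0]               # hypothetical observations
grid = [3.0 + 0.002 * i for i in range(2001)]    # mu from 3.0 to 7.0
dens_normal = posterior_mu(y, 0.0, grid)         # beta0 = 0: t-distribution with n-1 d.f.
dens_long_tailed = posterior_mu(y, 0.5, grid)    # a more leptokurtic parent
mode_normal = grid[dens_normal.index(max(dens_normal))]
print(mode_normal)
```

With \beta_0 = 0 the grid values follow the t kernel of case (ii); other
values of \beta_0 only change the order of the absolute moment.
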

CHOICE OF PRIOR DISTRIBUTIONS FOR \mu_1, \mu_2, \sigma_1, \sigma_2 AND \beta

As mentioned earlier in (4), assume that the location parameters and
the logarithms of the scale parameters are locally uniformly distributed
a priori, i.e.,

(14)    p_1(\mu_i) \propto k_1

(15)    p_2(\log\sigma_i) \propto k_2   or   p_2(\sigma_i) \propto 1/\sigma_i,   i = 1, 2

This assumption is appropriate so long as it is assumed that any point
in a region in which the likelihood for \mu_1, \mu_2, \log\sigma_1 and
\log\sigma_2 is appreciable would have been as acceptable a priori as any
other. (This is the assumption used in Savage's principle of precise
measurement.)

Suppose \beta is a measure of non-normality. Choose an a priori distribution
for \beta with modal value at \beta = 0 and containing an adjustable parameter
which controls the degree of concentration about this mode. A convenient
choice (Box and Tiao (1962)) is

(16)    p(\beta) = \frac{\Gamma(2a)}{[\Gamma(a)]^2 2^{2a-1}} (1 - \beta^2)^{a-1},    -1 < \beta < 1,   a \ge 1

When a = 1, this distribution is uniform. The parameter a can be adjusted to
allow for any desired strength of central limit effect. The case a = 1,
giving a uniform distribution for p(\beta), corresponds to no central limit
effect. When a tends to infinity, p(\beta) becomes a delta function and
represents an overwhelmingly strong central limit effect. This corresponds to
the assumption of exact normality for the parent distribution.
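
The effect of the adjustable parameter a can be seen numerically. This sketch
assumes the prior
p(\beta) = \Gamma(2a) (1-\beta^2)^{a-1} / ([\Gamma(a)]^2 2^{2a-1}) on
(-1, 1); the function names are illustrative.

```python
from math import gamma

def prior_beta(beta, a):
    """Prior density for the non-normality parameter, for a >= 1."""
    if not -1.0 < beta < 1.0:
        return 0.0
    const = gamma(2.0 * a) / (gamma(a) ** 2 * 2.0 ** (2.0 * a - 1.0))
    return const * (1.0 - beta * beta) ** (a - 1.0)

def total_mass(a, steps=20000):
    """Midpoint-rule check that the density integrates to one over (-1, 1)."""
    h = 2.0 / steps
    return h * sum(prior_beta(-1.0 + (i + 0.5) * h, a) for i in range(steps))

for a in (1.0, 3.0, 10.0):
    print(a, prior_beta(0.0, a), total_mass(a))
```

The height at \beta = 0 grows with a, reflecting the increasing concentration
of the prior about normality.
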

DERIVATION OF THE POSTERIOR DISTRIBUTION OF THE
VARIANCE RATIO \sigma_2^2/\sigma_1^2 FOR FIXED VALUES OF \mu_1, \mu_2
AND \beta

From (7), the joint likelihood function of the two samples
y_1 = (y_{11}, y_{12}, ..., y_{1 n_1}) and y_2 = (y_{21}, y_{22}, ..., y_{2 n_2}) is

(17)    \ell(\sigma_1,\sigma_2,\mu_1,\mu_2,\beta|y_1,y_2) = k \sigma_1^{-n_1} \sigma_2^{-n_2} \exp\{ -\frac{1}{2} \sum_{i=1}^{2} n_i s_i(\beta,\mu_i) / \sigma_i^{2/(1+\beta)} \}

where   s_i(\beta,\mu_i) = \frac{1}{n_i} \sum_{j=1}^{n_i} |y_{ij} - \mu_i|^{2/(1+\beta)}

and     k = \{ \Gamma[1 + \frac{1}{2}(1+\beta)] 2^{[1 + \frac{1}{2}(1+\beta)]} \}^{-(n_1+n_2)}

Here \mu_1, \mu_2 are assumed to be known.

The joint posterior distribution of \sigma_1, \sigma_2 and \beta is then

(18)    p(\sigma_1,\sigma_2,\beta|\mu_1,\mu_2,y_1,y_2)
          = p(\beta|\mu_1,\mu_2,y_1,y_2) p(\sigma_1,\sigma_2|\beta,\mu_1,\mu_2,y_1,y_2)
          = k p(\sigma_1) p(\sigma_2) p(\beta) \ell(\sigma_1,\sigma_2,\beta|\mu_1,\mu_2,y_1,y_2)

The conditional posterior distribution of \sigma_1 and \sigma_2 for a given
value of \beta is

(19)    p(\sigma_1,\sigma_2|\beta,\mu_1,\mu_2,y_1,y_2) = \prod_{i=1}^{2} p(\sigma_i|\beta,\mu_i,y_i)

where   p(\sigma_i|\beta,\mu_i,y_i) = k_i \sigma_i^{-(n_i+1)} \exp\{ -\frac{1}{2} n_i s_i(\beta,\mu_i) / \sigma_i^{2/(1+\beta)} \}

and     k_i = n_i [\frac{1}{2} n_i s_i(\beta,\mu_i)]^{n_i(1+\beta)/2} / \Gamma[1 + \frac{1}{2} n_i (1+\beta)]

which is seen to be the product of two inverted gamma distributions. The
a posteriori distribution of \sigma_2^2/\sigma_1^2 is obtained by making the
transformation V = \sigma_2^2/\sigma_1^2, W = \sigma_1, and integrating out W.


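Thus,

(20)    p(V|\beta,\mu_1,\mu_2,y_1,y_2) = k V^{\frac{1}{2} n_1 - 1} [ 1 + \frac{n_1 s_1(\beta,\mu_1)}{n_2 s_2(\beta,\mu_2)} V^{1/(1+\beta)} ]^{-\frac{1}{2}(n_1+n_2)(1+\beta)}

where   k = \frac{1}{1+\beta} \frac{\Gamma[\frac{1}{2}(n_1+n_2)(1+\beta)]}{\prod_{i=1}^{2} \Gamma[\frac{1}{2} n_i (1+\beta)]} [ \frac{n_1 s_1(\beta,\mu_1)}{n_2 s_2(\beta,\mu_2)} ]^{\frac{1}{2} n_1 (1+\beta)}

(Box and Tiao (1963))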

Now consider the quantity [s_1(\beta,\mu_1)/s_2(\beta,\mu_2)] V^{1/(1+\beta)},
where V = \sigma_2^2/\sigma_1^2 is a random variable and
s_1(\beta,\mu_1)/s_2(\beta,\mu_2) is a constant calculated from the
observations. After an appropriate transformation

(21)    p\{ \frac{s_1(\beta,\mu_1)}{s_2(\beta,\mu_2)} V^{1/(1+\beta)} \,|\, \beta,\mu_1,\mu_2,y_1,y_2 \} = p(F_{n_1(1+\beta), n_2(1+\beta)}),

an F-distribution with n_1(1+\beta) and n_2(1+\beta) d.f.

In particular, when \beta = 0, the quantity V s_1^2/s_2^2, with
s_i^2 = s_i(0,\mu_i), is distributed as F with n_1 and n_2 d.f.


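Further, when the value of \beta -> -1 (the parent distributions tend to the
rectangular form), the quantity

        u = V [ (h_1 + |m_1 - \mu_1|) / (h_2 + |m_2 - \mu_2|) ]^2,

with h_i and m_i the half-range and the mid-range of the i-th sample, has the
distribution

(22)    \lim_{\beta \to -1} p(u|\beta,\mu_1,\mu_2,y_1,y_2) = \frac{n_1 n_2}{2(n_1+n_2)} u^{\frac{1}{2} n_1 - 1}        for u < 1

                                                           = \frac{n_1 n_2}{2(n_1+n_2)} u^{-(\frac{1}{2} n_2 + 1)}     for u > 1

(Box and Tiao (1963))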

Thus, for given \beta not close to -1, probability levels of V can be
obtained from the F-table. In particular, the probability a posteriori that
the variance ratio V exceeds unity is

(23)    Pr[V > 1|\beta,\mu_1,\mu_2,y_1,y_2] = Pr[ F_{n_1(1+\beta), n_2(1+\beta)} > s_1(\beta,\mu_1)/s_2(\beta,\mu_2) ]
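
When F-tables for fractional degrees of freedom are not at hand, the
probability that V exceeds unity can be approximated by simulation. The sketch
below is an illustration under stated assumptions, not the procedure of the
text: it takes V = \sigma_2^2/\sigma_1^2, a prior proportional to
1/\sigma_i for each scale parameter, and uses the fact that, a posteriori,
n_i s_i(\beta,\mu_i)/\sigma_i^{2/(1+\beta)} is distributed as chi-square with
n_i(1+\beta) d.f. The data and function names are hypothetical.

```python
import random

def s_stat(y, mu, beta):
    """s(beta, mu): mean absolute deviation of order 2 / (1 + beta) about mu."""
    p = 2.0 / (1.0 + beta)
    return sum(abs(v - mu) ** p for v in y) / len(y)

def draw_sigma_sq(y, mu, beta, rng):
    """One posterior draw of sigma**2 for fixed beta and mu."""
    n = len(y)
    # Chi-square with n(1+beta) d.f. is Gamma(shape = n(1+beta)/2, scale = 2).
    chi2 = rng.gammavariate(0.5 * n * (1.0 + beta), 2.0)
    return (n * s_stat(y, mu, beta) / chi2) ** (1.0 + beta)

def prob_V_exceeds_one(y1, mu1, y2, mu2, beta, draws=20000, seed=0):
    """Monte Carlo estimate of Pr[V > 1 | beta, mu_1, mu_2, y_1, y_2]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        v = draw_sigma_sq(y2, mu2, beta, rng) / draw_sigma_sq(y1, mu1, beta, rng)
        hits += v > 1.0
    return hits / draws

y1 = [-1.2, 0.4, 0.9, -0.3, 1.5, -0.8, 0.2, -0.6]   # hypothetical first sample
y2 = [3.0 * v for v in y1]                           # second sample, three times as spread out
print(prob_V_exceeds_one(y1, 0.0, y2, 0.0, 0.0))
```

Repeating the call for several values of \beta shows how strongly the
conclusion about the variance ratio depends on the assumed parent.
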

RELATIONSHIP BETWEEN THE POSTERIOR DISTRIBUTION
p(V|\beta,\mu_1,\mu_2,y_1,y_2) AND CLASSICAL PROCEDURES

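From (17), it can be shown that the two power sums n_1 s_1(\beta,\mu_1) and
n_2 s_2(\beta,\mu_2), when regarded as functions of the random variables y_1
and y_2, have the joint moment generating function

(24)    M_Y(t_1,t_2) = \prod_{i=1}^{2} [1 - 2 \sigma_i^{2/(1+\beta)} t_i]^{-n_i(1+\beta)/2}

where Y = (n_1 s_1(\beta,\mu_1), n_2 s_2(\beta,\mu_2)). Thus, letting

        Y' = ( n_1 s_1(\beta,\mu_1)/\sigma_1^{2/(1+\beta)},  n_2 s_2(\beta,\mu_2)/\sigma_2^{2/(1+\beta)} ),

one obtains

(25)    M_{Y'}(t_1,t_2) = \prod_{i=1}^{2} (1 - 2 t_i)^{-n_i(1+\beta)/2}

(Box and Tiao (1963))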

This is a product of the moment generating functions of two independently
distributed \chi^2 variables with n_1(1+\beta) and n_2(1+\beta) degrees of
freedom respectively. Therefore s_1(\beta,\mu_1)/s_2(\beta,\mu_2), on the
hypothesis that \sigma_1^2/\sigma_2^2 = 1, is distributed as F with
n_1(1+\beta) and n_2(1+\beta) degrees of freedom and in fact provides a
uniformly most powerful similar test of this hypothesis against the
alternative that \sigma_1^2/\sigma_2^2 > 1. The significance level associated
with the observed s_1(\beta,\mu_1)/s_2(\beta,\mu_2) is

(26)    Pr\{ F_{n_1(1+\beta), n_2(1+\beta)} > s_1(\beta,\mu_1)/s_2(\beta,\mu_2) \}

and is numerically equal to the probability for V > 1 given in equation (23).

A general test derived by Neyman and Pearson and later modified by
Bartlett (1937) for comparing k variances of normal populations using the
likelihood ratio method is as follows. Let

(27)    \lambda(0) = \prod_{i=1}^{k} [ \frac{N s_i(0,\mu_i)}{\sum_{i=1}^{k} n_i s_i(0,\mu_i)} ]^{n_i/2},     N = \sum_{i=1}^{k} n_i

The quantity -2 \log\lambda(0) / g(0) is distributed approximately as \chi^2
with k degrees of freedom, where

(28)    g(\beta) = 1 + [3k(1+\beta)]^{-1} [ \sum_{i=1}^{k} n_i^{-1} - N^{-1} ]

In general, the likelihood ratio \lambda(\beta) is given by

(29)    \lambda(\beta) = \prod_{i=1}^{k} [ \frac{N s_i(\beta,\mu_i)}{\sum_{i=1}^{k} n_i s_i(\beta,\mu_i)} ]^{\frac{1}{2} n_i (1+\beta)}

The quantity -2 \log\lambda(\beta) / g(\beta) is approximately distributed as
\chi^2 with k degrees of freedom.


THE POSTERIOR DISTRIBUTION OF V WHEN \beta IS
REGARDED AS A VARIABLE PARAMETER


The joint posterior distribution of V and \beta can be written

(30)    p(V,\beta|\mu_1,\mu_2,y_1,y_2) = p(\beta|\mu_1,\mu_2,y_1,y_2) p(V|\beta,\mu_1,\mu_2,y_1,y_2)

where p(V|\beta,\mu_1,\mu_2,y_1,y_2) is given by equation (20).

The marginal distribution of \beta can be written as the product

(31)    p(\beta|\mu_1,\mu_2,y_1,y_2) \propto p(\beta) \ell(\beta|\mu_1,\mu_2,y_1,y_2)

where p(\beta) is given by equation (16) and

        \ell(\beta|\mu_1,\mu_2,y_1,y_2) \propto \{\Gamma[1 + \frac{1}{2}(1+\beta)]\}^{-(n_1+n_2)} \prod_{i=1}^{2} \Gamma[1 + \frac{1}{2} n_i (1+\beta)] [n_i s_i(\beta,\mu_i)]^{-\frac{1}{2} n_i (1+\beta)}

which is the integrated likelihood for \beta. Thus
p(\beta|\mu_1,\mu_2,y_1,y_2) contains information of two kinds: the knowledge
a priori about \beta is characterized by p(\beta), and the information coming
from the sample concerning \beta is represented by
\ell(\beta|\mu_1,\mu_2,y_1,y_2).

The posterior distribution of V is obtained by integrating out \beta from
equation (30), giving

(32)    p(V|\mu_1,\mu_2,y_1,y_2) = \int_{-1}^{+1} p(\beta|\mu_1,\mu_2,y_1,y_2) p(V|\beta,\mu_1,\mu_2,y_1,y_2) d\beta

In particular, the probability a posteriori that the variance ratio V
exceeds unity is

(33)    Pr[V > 1|\mu_1,\mu_2,y_1,y_2] = \int_{-1}^{+1} Pr[V > 1|\beta,\mu_1,\mu_2,y_1,y_2] p(\beta|\mu_1,\mu_2,y_1,y_2) d\beta

where the first factor in the integrand is given in (23).
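
The marginal posterior of \beta has no simple closed form, but it is easily
tabulated on a grid. The following is a sketch under stated assumptions: the
prior for \beta is taken as proportional to (1-\beta^2)^{a-1} on (-1, 1), the
scale parameters have priors proportional to 1/\sigma_i, and the integrated
likelihood is taken, up to a constant, as
\{\Gamma[1+\frac{1}{2}(1+\beta)]\}^{-(n_1+n_2)} \prod_i \Gamma[1+\frac{1}{2}n_i(1+\beta)] [n_i s_i(\beta,\mu_i)]^{-n_i(1+\beta)/2}.
The data and function names are hypothetical.

```python
from math import lgamma, log, exp

def log_prior(beta, a):
    """Log of the prior kernel (1 - beta**2)**(a - 1); constants cancel on normalizing."""
    return (a - 1.0) * log(1.0 - beta * beta)

def log_integrated_likelihood(beta, samples, mus):
    """Log integrated likelihood of beta for known location parameters."""
    p = 2.0 / (1.0 + beta)
    total_n = sum(len(y) for y in samples)
    out = -total_n * lgamma(1.0 + 0.5 * (1.0 + beta))
    for y, mu in zip(samples, mus):
        n = len(y)
        ns = sum(abs(v - mu) ** p for v in y)      # n_i * s_i(beta, mu_i)
        out += lgamma(1.0 + 0.5 * n * (1.0 + beta)) - 0.5 * n * (1.0 + beta) * log(ns)
    return out

def posterior_beta(samples, mus, a, grid_size=199):
    """Posterior density of beta on an interior grid of (-1, 1)."""
    h = 2.0 / (grid_size + 1)
    betas = [-1.0 + h * (i + 1) for i in range(grid_size)]   # avoids beta = -1 and beta = 1
    logs = [log_prior(b, a) + log_integrated_likelihood(b, samples, mus) for b in betas]
    top = max(logs)
    weights = [exp(v - top) for v in logs]                    # rescaled to avoid underflow
    area = h * sum(weights)
    return betas, [w / area for w in weights]

sample_1 = [-1.2, 0.4, 0.9, -0.3, 1.5, -0.8, 0.2, -0.6]     # hypothetical data, mu_1 = 0
sample_2 = [2.1, -0.7, 1.3, -2.4, 0.5, -1.1, 1.8, -0.2]     # hypothetical data, mu_2 = 0
betas, dens = posterior_beta([sample_1, sample_2], [0.0, 0.0], a=1.0)
print(betas[dens.index(max(dens))])
```

Averaging Pr[V > 1 | \beta, ...] over this grid with weights proportional to
dens gives a numerical version of (33).
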
POSTERIOR DISTRIBUTION OF V WHEN \mu_1 AND \mu_2 ARE
REGARDED AS VARIABLE PARAMETERS

As usual, suppose \mu_1 and \mu_2 are locally uniformly distributed a priori
as in (14). Upon integrating out these two parameters from the joint
posterior distribution of the set (\mu_1, \mu_2, \beta, V), one can write the
a posteriori distribution of \beta and V as

(34)    p(V,\beta|y_1,y_2) = p(V|\beta,y_1,y_2) p(\beta|y_1,y_2)

The conditional a posteriori distribution of V for a fixed value of \beta
is given by

(35)    p(V|\beta,y_1,y_2) = k V^{\frac{1}{2} n_1 - 1} \iint [ n_2 s_2(\beta,\mu_2) + V^{1/(1+\beta)} n_1 s_1(\beta,\mu_1) ]^{-\frac{1}{2}(n_1+n_2)(1+\beta)} d\mu_1 d\mu_2

where

        k^{-1} = (1+\beta) \{\Gamma[\frac{1}{2}(n_1+n_2)(1+\beta)]\}^{-1} \prod_{i=1}^{2} \Gamma[\frac{1}{2} n_i (1+\beta)] \int [n_i s_i(\beta,\mu_i)]^{-\frac{1}{2} n_i (1+\beta)} d\mu_i

and s_i(\beta,\mu_i), i = 1, 2, are given in (17).

When the parents are normal (\beta = 0), the quantity

        F = V \frac{ \sum_j (y_{1j} - \bar{y}_1)^2 / (n_1 - 1) }{ \sum_j (y_{2j} - \bar{y}_2)^2 / (n_2 - 1) }

has an F-distribution with (n_1 - 1) and (n_2 - 1) degrees of freedom. When
the parents tend to the rectangular form (\beta -> -1), the quantity
w = V (h_1/h_2)^2, where h_1 and h_2 are respectively the ranges of the first
and the second sample, has the following limiting distribution

(36)    \lim_{\beta \to -1} p(w|\beta,y_1,y_2) = k w^{\frac{1}{2}(n_1 - 3)} [ (n_1+n_2) - (n_1+n_2-2) w^{1/2} ]        for w \le 1

                                               = k w^{-\frac{1}{2}(n_2 + 1)} [ (n_1+n_2) - (n_1+n_2-2) w^{-1/2} ]     for w > 1

with    k = \frac{ n_1 n_2 (n_1-1)(n_2-1) }{ 2(n_1+n_2)(n_1+n_2-1)(n_1+n_2-2) }
COMPUTATIONAL PROCEDURES FOR THE POSTERIOR
DISTRIBUTION p(V|\beta,y_1,y_2)

To avoid complexities in evaluating the double integral in (35), the
following procedure is adopted. The general expression for the moments of V
is obtained in the form

(37)    E(V^r|\beta,y_1,y_2) = k \frac{ \int [n_1 s_1(\beta,\mu_1)]^{-\frac{1}{2}(n_1+2r)(1+\beta)} d\mu_1 \int [n_2 s_2(\beta,\mu_2)]^{-\frac{1}{2}(n_2-2r)(1+\beta)} d\mu_2 }{ \prod_{i=1}^{2} \int [n_i s_i(\beta,\mu_i)]^{-\frac{1}{2} n_i (1+\beta)} d\mu_i }

with    k = \Gamma[\frac{1}{2}(n_1+2r)(1+\beta)] \Gamma[\frac{1}{2}(n_2-2r)(1+\beta)] / \prod_{i=1}^{2} \Gamma[\frac{1}{2} n_i (1+\beta)]

This expression involves only a simple evaluation of one-dimensional
integrals.

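A simpler method can also be employed. The moments of the conditional
a posteriori distribution of V for fixed choices of \mu_1 and \mu_2 are
given by

(38)    E(V^r|\beta,\mu_1,\mu_2,y_1,y_2) = [ \frac{n_2 s_2(\beta,\mu_2)}{n_1 s_1(\beta,\mu_1)} ]^{r(1+\beta)} \frac{ \Gamma[\frac{1}{2}(n_1+2r)(1+\beta)] \Gamma[\frac{1}{2}(n_2-2r)(1+\beta)] }{ \prod_{i=1}^{2} \Gamma[\frac{1}{2} n_i (1+\beta)] }

and the joint a posteriori distribution of \mu_1 and \mu_2 is

(39)    p(\mu_1,\mu_2|\beta,y_1,y_2) = \prod_{i=1}^{2} \frac{ [n_i s_i(\beta,\mu_i)]^{-\frac{1}{2} n_i (1+\beta)} }{ \int [n_i s_i(\beta,\mu_i)]^{-\frac{1}{2} n_i (1+\beta)} d\mu_i }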
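
The conditional moments of V are simple to evaluate with log-gamma functions.
The sketch below assumes the moment formula
E(V^r) = [n_2 s_2 / (n_1 s_1)]^{r(1+\beta)} \Gamma[\frac{1}{2}(n_1+2r)(1+\beta)] \Gamma[\frac{1}{2}(n_2-2r)(1+\beta)] / \prod_i \Gamma[\frac{1}{2}n_i(1+\beta)]
for fixed \mu_1 and \mu_2; the argument names and numerical inputs are
hypothetical.

```python
from math import lgamma, exp, log

def moment_V(r, n1s1, n2s2, n1, n2, beta):
    """E(V**r | beta, mu_1, mu_2, y_1, y_2); requires n2 - 2r > 0."""
    c = 0.5 * (1.0 + beta)
    log_m = (r * (1.0 + beta) * (log(n2s2) - log(n1s1))
             + lgamma(c * (n1 + 2 * r)) + lgamma(c * (n2 - 2 * r))
             - lgamma(c * n1) - lgamma(c * n2))
    return exp(log_m)

# Normal parent (beta = 0) with n1 = 8, n2 = 10, s1 = 2.0, s2 = 3.0.
mean_V = moment_V(1, 8 * 2.0, 10 * 3.0, 8, 10, 0.0)
print(mean_V)
```

For \beta = 0 the first moment reduces to (s_2/s_1) n_2/(n_2 - 2), the mean
of a scaled F variable with n_1 and n_2 d.f., which is a convenient check.

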
As shown in the paper by Box and Tiao (1962), for fixed \beta the function

(40)    f(\mu) = n s(\beta,\mu) = \sum_{i=1}^{n} |y_i - \mu|^{2/(1+\beta)}

has a continuous first derivative and a unique minimum at some point in the
interval [y_{(1)}, y_{(n)}] between the smallest and the largest observation.
When \beta = 0 or -1 < \beta < -1/3, it can be shown that the third derivative
f'''(\mu) exists and is continuous. Thus, for these values of \beta, one can
employ Taylor's theorem to expand f(\mu) into

        f(\mu) \approx f(\hat{\mu}) + \frac{1}{2} f''(\hat{\mu}) (\mu - \hat{\mu})^2

where \hat{\mu} is the point at which f(\mu) attains a minimum. This
approximation will be satisfactory when \beta is not close to -1. From this
result, one finds that the moments of V in equation (37) are approximately

(41)    E(V^r|\beta,y_1,y_2) \approx [ \frac{n_2 s_2(\beta,\hat{\mu}_2)}{n_1 s_1(\beta,\hat{\mu}_1)} ]^{r(1+\beta)} \frac{ \Gamma\{\frac{1}{2}[(n_1+2r)(1+\beta) - 1]\} \Gamma\{\frac{1}{2}[(n_2-2r)(1+\beta) - 1]\} }{ \prod_{i=1}^{2} \Gamma\{\frac{1}{2}[n_i(1+\beta) - 1]\} }

This implies that, to this degree of approximation, the moments of

(42)    C(V) = V^{1/(1+\beta)} \frac{ n_1 s_1(\beta,\hat{\mu}_1) / [n_1(1+\beta) - 1] }{ n_2 s_2(\beta,\hat{\mu}_2) / [n_2(1+\beta) - 1] }

are the same as those of an F variable with n_1(1+\beta) - 1 and
n_2(1+\beta) - 1 degrees of freedom, and hence that the a posteriori
distribution of C(V) can be closely approximated using ordinary F-tables. In
this approximation, the nuisance parameters \mu_1 and \mu_2 in the
a posteriori distribution of V are eliminated by the very simple process of
replacing them by their maximum likelihood values and reducing the degrees of
freedom by one unit.

The justification supplied above for this simple approximation is,
unfortunately, only valid when \beta = 0 or \beta < -1/3 but not close to -1.
But in actual practice, the approximation has a much wider usefulness.

BAYESIAN ANALYSIS OF THE REGRESSION MODEL

(i) The regression model with the coefficient vector
\beta = (\beta_1, \beta_2, ..., \beta_p)' can be written as

(43)    y = X\beta + \epsilon

where y is a T x 1 vector of observations, X is a T x p matrix of fixed
elements with rank p, and \epsilon is a T x 1 vector of random disturbances.
Assume that the \epsilon's are NID(0, \sigma^2). Under these assumptions the
likelihood function is

(44)    \ell(\beta,\sigma|y) = ( \frac{1}{\sigma\sqrt{2\pi}} )^T \exp[ -\frac{1}{2\sigma^2} (y - X\beta)'(y - X\beta) ]

Denote the quadratic form in the variables \beta centered at \eta and with
matrix A by Q(\beta,\eta,A) = (\beta-\eta)' A (\beta-\eta).

Then the likelihood function can be rewritten as

(45)    \ell(\beta,\sigma|y) = ( \frac{1}{\sigma\sqrt{2\pi}} )^T \exp[ -\frac{1}{2\sigma^2} \{ \nu s^2 + Q(\beta,\hat{\beta},Z) \} ]

where Z = X'X, \hat{\beta} = Z^{-1} X'y, \nu = T - p and
s^2 = \frac{1}{\nu} (y - X\hat{\beta})'(y - X\hat{\beta}).

Using Bayes' theorem, the joint posterior distribution can be written as

(46)    p(\beta,\sigma|y) = k p(\beta,\sigma) \ell(\beta,\sigma|y)

where   k^{-1} = \iint p(\beta,\sigma) \ell(\beta,\sigma|y) d\beta d\sigma

and p(\beta,\sigma) is the prior distribution of the parameters \beta and
\sigma.

When nothing is known about \beta and \sigma, the a priori distributions of
\beta and \log\sigma could be taken as locally uniform and independent, i.e.,

(47)    p(\beta) \propto k_1;   p(\log\sigma) \propto k_2   or   p(\sigma) \propto 1/\sigma

Combining (45) and (47) in (46), the joint posterior distribution of
$\beta$ and $\sigma$ is

(48)   $p(\beta, \sigma \mid y) = \text{const}\; \sigma^{-(T+1)} \exp\left\{ -\frac{1}{2\sigma^{2}} \left[ \nu s^{2} + Q(\beta, \hat\beta, Z) \right] \right\}$

The marginal posterior distribution of $\beta$ is obtained by integrating the
joint posterior density function over $\sigma$, which gives

(49)   $p(\beta \mid y) = \text{const} \left[ 1 + \frac{Q(\beta, \hat\beta, Z)}{\nu s^{2}} \right]^{-\frac{1}{2}(\nu + p)}$     (Savage (1962))

This distribution is in the form of a multivariate t-distribution.
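
The integration over $\sigma$ that leads from (48) to (49) can be sketched as follows. The shorthand $a = \nu s^{2} + Q(\beta, \hat\beta, Z)$ and the substitution variable $u$ are introduced here only for illustration and do not appear in the original; the last step uses $T = \nu + p$.

```latex
% Substitute u = a / (2 \sigma^2), so that \sigma = (a/2)^{1/2} u^{-1/2}.
\int_0^\infty \sigma^{-(T+1)} \exp\!\left(-\frac{a}{2\sigma^{2}}\right) d\sigma
  = \frac{1}{2}\left(\frac{a}{2}\right)^{-T/2} \int_0^\infty u^{T/2 - 1} e^{-u}\, du
  = \frac{1}{2}\,\Gamma\!\left(\frac{T}{2}\right) \left(\frac{a}{2}\right)^{-T/2}
% Only the dependence on \beta matters for the posterior:
a^{-T/2}
  = (\nu s^{2})^{-\frac{1}{2}(\nu + p)}
    \left[ 1 + \frac{Q(\beta, \hat\beta, Z)}{\nu s^{2}} \right]^{-\frac{1}{2}(\nu + p)}
```

The factor $(\nu s^{2})^{-\frac{1}{2}(\nu + p)}$ does not involve $\beta$ and is absorbed into the constant.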

In particular, the marginal distribution of a single element of $\beta$ can
be expressed in terms of a univariate t-distribution with $T - p$ degrees of
freedom.
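
As a minimal sketch of how the quantities entering the multivariate t form could be computed, the following Python fragment evaluates the least-squares location, the degrees of freedom and the residual variance, together with the logarithm of the unnormalized density. It assumes NumPy is available; the function names `diffuse_posterior` and `log_kernel` and the simulated data are hypothetical and are not part of the original.

```python
import numpy as np

def diffuse_posterior(X, y):
    """Return (beta_hat, nu, s2, Z) for the locally uniform prior on beta and log sigma."""
    T, p = X.shape
    Z = X.T @ X                                # Z = X'X
    beta_hat = np.linalg.solve(Z, X.T @ y)     # centre of the posterior
    nu = T - p                                 # degrees of freedom
    resid = y - X @ beta_hat
    s2 = float(resid @ resid) / nu             # residual variance s^2
    return beta_hat, nu, s2, Z

def log_kernel(beta, beta_hat, nu, s2, Z):
    """Log of the unnormalized density [1 + Q/(nu s^2)]^(-(nu + p)/2)."""
    d = beta - beta_hat
    p = len(beta_hat)
    return -0.5 * (nu + p) * np.log1p(float(d @ Z @ d) / (nu * s2))

# Simulated data (an assumption made only for this illustration).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(30), rng.normal(size=30)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=30)
beta_hat, nu, s2, Z = diffuse_posterior(X, y)
```

The kernel attains its maximum at `beta_hat`, so under this prior the posterior is centred at the least-squares estimate.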

(ii)     According to Raiffa and Schlaifer, consider $\sigma_1^{2}/\sigma_2^{2} = k$ where $k$
is known. Suppose $k = 1$, so that $\sigma_1 = \sigma_2 = \sigma$. For this case, they show
that

(50)   $p(\beta, \sigma \mid y_1, y_2) = \text{const}\; \sigma^{-(T_1 + T_2 + 1)} \exp\left\{ -\frac{1}{2\sigma^{2}} \left[ \nu_1 s_1^{2} + \nu_2 s_2^{2} + Q(\beta, \bar\beta, Z) \right] \right\}$

where $Z = Z_1 + Z_2$, $\bar\beta = Z^{-1}(Z_1 \hat\beta_1 + Z_2 \hat\beta_2)$ and the quantities
$(\nu_i, s_i, \hat\beta_i, Z_i)$, $i = 1, 2$, are defined in connection with (45).

On integrating out $\sigma$, the posterior distribution of $\beta$ is

(51)   $p(\beta \mid y_1, y_2) = \text{const} \left[ 1 + \frac{Q(\beta, \bar\beta, Z)}{\nu s^{2}} \right]^{-\frac{1}{2}(\nu + p)}$

with $\nu = T_1 + T_2 - p$ and $s^{2} = \frac{1}{\nu}(\nu_1 s_1^{2} + \nu_2 s_2^{2})$, which is in the form of (49).
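
The pooled centre in the equal-variance case is a matrix-weighted average of the two least-squares estimates, with weights $Z_1 = X_1'X_1$ and $Z_2 = X_2'X_2$. A short numerical sketch, assuming NumPy and using simulated data and a hypothetical function name `pooled_estimate`, shows that it coincides with least squares applied to the two samples stacked together.

```python
import numpy as np

def pooled_estimate(X1, y1, X2, y2):
    """Matrix-weighted average Z^{-1}(Z1 b1 + Z2 b2) with Z = Z1 + Z2."""
    Z1, Z2 = X1.T @ X1, X2.T @ X2
    b1 = np.linalg.solve(Z1, X1.T @ y1)    # least squares, sample 1
    b2 = np.linalg.solve(Z2, X2.T @ y2)    # least squares, sample 2
    return np.linalg.solve(Z1 + Z2, Z1 @ b1 + Z2 @ b2)

# Simulated samples sharing one coefficient vector (illustrative only).
rng = np.random.default_rng(1)
X1, X2 = rng.normal(size=(20, 3)), rng.normal(size=(15, 3))
beta = np.array([0.5, -1.0, 2.0])
y1 = X1 @ beta + rng.normal(size=20)
y2 = X2 @ beta + rng.normal(size=15)

pooled = pooled_estimate(X1, y1, X2, y2)
stacked = np.linalg.lstsq(np.vstack([X1, X2]),
                          np.concatenate([y1, y2]), rcond=None)[0]
```

The agreement follows because $Z_i \hat\beta_i = X_i' y_i$, so the weighted sum in the numerator is just $X_1'y_1 + X_2'y_2$.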
(iii)     Theil (1963) considered the case when $\sigma_1$ and $\sigma_2$ are different
and $\sigma_1$ is known. He proceeded within the sampling theory framework to
construct the following estimator for $\beta$ which incorporates information from
both samples,

(52)   $\tilde\beta = \left( \frac{1}{\sigma_1^{2}} Z_1 + \frac{1}{s_2^{2}} Z_2 \right)^{-1} \left( \frac{1}{\sigma_1^{2}} Z_1 \hat\beta_1 + \frac{1}{s_2^{2}} Z_2 \hat\beta_2 \right)$

The a posteriori distribution of $\beta$ is given by

(53)   $p(\beta \mid y_1, y_2) = \text{const}\; \exp\left\{ -\frac{1}{2\sigma_1^{2}} Q(\beta, \hat\beta_1, Z_1) \right\} \left[ 1 + \frac{Q(\beta, \hat\beta_2, Z_2)}{\nu_2 s_2^{2}} \right]^{-\frac{1}{2}(\nu_2 + p)}$

where $\sigma_1$ is known and $\beta$ and $\log \sigma_2$ are locally uniform a priori. The
expression (53) is the product of two factors, the first is a multivariate
normal form and the other a multivariate t-form.

(iv)     Suppose $\sigma_1$ and $\sigma_2$ are independent and unknown. In such cases,
with locally uniform a priori distributions for $\beta$, $\log \sigma_1$ and $\log \sigma_2$,
one finds that the a posteriori distribution of $\beta$ based upon two samples
is given by

(54)   $p(\beta \mid y_1, y_2) = k^{-1} \left[ 1 + \frac{Q(\beta, \hat\beta_1, Z_1)}{\nu_1 s_1^{2}} \right]^{-\frac{1}{2}(\nu_1 + p)} \left[ 1 + \frac{Q(\beta, \hat\beta_2, Z_2)}{\nu_2 s_2^{2}} \right]^{-\frac{1}{2}(\nu_2 + p)}$

with   $k = \int_R \left[ 1 + \frac{Q(\beta, \hat\beta_1, Z_1)}{\nu_1 s_1^{2}} \right]^{-\frac{1}{2}(\nu_1 + p)} \left[ 1 + \frac{Q(\beta, \hat\beta_2, Z_2)}{\nu_2 s_2^{2}} \right]^{-\frac{1}{2}(\nu_2 + p)} d\beta$

This distribution is the product of two multivariate 't' distributions,
which is known as the multivariate "double-t" distribution. The normalizing
factor $k$ here is a $p$-dimensional integral.

The result obtained in (54) is applicable to the problem of making
inferences about a population mean when samples are drawn from two normal
populations with common mean and unequal variances.
In this case expression (54) reduces to

(55)   $p(\beta \mid y_1, y_2) = k^{-1} \left[ 1 + \frac{(\nu_1 + 1)(\beta - \bar y_1)^{2}}{\nu_1 s_1^{2}} \right]^{-\frac{1}{2}(\nu_1 + 1)} \left[ 1 + \frac{(\nu_2 + 1)(\beta - \bar y_2)^{2}}{\nu_2 s_2^{2}} \right]^{-\frac{1}{2}(\nu_2 + 1)}$

where   $k = \int \left[ 1 + \frac{(\nu_1 + 1)(\beta - \bar y_1)^{2}}{\nu_1 s_1^{2}} \right]^{-\frac{1}{2}(\nu_1 + 1)} \left[ 1 + \frac{(\nu_2 + 1)(\beta - \bar y_2)^{2}}{\nu_2 s_2^{2}} \right]^{-\frac{1}{2}(\nu_2 + 1)} d\beta$

and the quantities $\bar y_1$, $\bar y_2$, $s_1^{2}$ and $s_2^{2}$ are, respectively, the sample means and
the sample variances for the two sets of experiments.
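
For the common-mean problem the normalizing constant is a one-dimensional integral and can be approximated directly. The following is a minimal sketch, assuming NumPy, in which the product of the two t-type factors is evaluated on a uniform grid and normalized by a Riemann sum. The function name `double_t_posterior`, the simulated samples and the grid limits are illustrative assumptions and not part of the original.

```python
import numpy as np

def double_t_posterior(y1, y2, grid):
    """Normalized posterior of a common mean on a uniform grid (unequal variances)."""
    log_dens = np.zeros_like(grid)
    for y in (y1, y2):
        T = len(y)                 # sample size, equal to nu + 1
        nu = T - 1
        ybar = y.mean()
        s2 = y.var(ddof=1)         # sample variance with divisor nu
        log_dens += -0.5 * (nu + 1) * np.log1p(T * (grid - ybar) ** 2 / (nu * s2))
    dens = np.exp(log_dens - log_dens.max())   # rescale for numerical stability
    step = grid[1] - grid[0]
    k = dens.sum() * step                      # Riemann-sum estimate of the constant
    return dens / k

# Two simulated samples with a common mean and unequal variances.
rng = np.random.default_rng(2)
y1 = rng.normal(10.0, 1.0, size=12)
y2 = rng.normal(10.0, 3.0, size=8)
grid = np.linspace(0.0, 20.0, 4001)
post = double_t_posterior(y1, y2, grid)
step = grid[1] - grid[0]
mode = grid[np.argmax(post)]
```

Since each factor is symmetric and unimodal about its own sample mean, the mode of the product lies between the two sample means.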

Generalizing the above discussion, suppose that the likelihood function
for the ith sample is in the form of (44) with parameters $(\beta, \sigma_i)$ and data
$(y_i, X_i, T_i)$, $i = 1, 2, \ldots, k$. Then by taking the $\sigma_i$'s as independent
scale parameters, one obtains the following posterior distribution of $\beta$:

(56)   $p(\beta \mid y) = w^{-1} \prod_{i=1}^{k} \left[ 1 + \frac{Q(\beta, \hat\beta_i, Z_i)}{\nu_i s_i^{2}} \right]^{-\frac{1}{2}(\nu_i + p)}$

where   $w = \int_R \prod_{i=1}^{k} \left[ 1 + \frac{Q(\beta, \hat\beta_i, Z_i)}{\nu_i s_i^{2}} \right]^{-\frac{1}{2}(\nu_i + p)} d\beta$     (Tiao and Zellner (1963))

This distribution is the product of k factors each of which can be
expressed as a multivariate 't' distribution.

ASYMPTOTIC EXPRESSION FOR THE MULTIVARIATE
"DOUBLE-t" POSTERIOR DISTRIBUTION


In the previous section, the problem of $p$-dimensional integration
is rather laborious and difficult. This can be simplified by expanding the
a posteriori distribution into an asymptotic series in powers of $\nu_1^{-1}$ and
$\nu_2^{-1}$, and one can reduce the problem of integration to a problem of evaluating
the mixed moments of two quadratic forms.

Let $M_1 = \frac{1}{s_1^{2}} Z_1$, $M_2 = \frac{1}{s_2^{2}} Z_2$, $Q_1 = Q(\beta, \hat\beta_1, M_1)$ and $Q_2 = Q(\beta, \hat\beta_2, M_2)$.

The expression (54) then becomes

(57)   $p(\beta \mid y_1, y_2) = k^{-1} \left[ 1 + \frac{Q_1}{\nu_1} \right]^{-\frac{1}{2}(\nu_1 + p)} \left[ 1 + \frac{Q_2}{\nu_2} \right]^{-\frac{1}{2}(\nu_2 + p)}$

with   $k = \int_R \left[ 1 + \frac{Q_1}{\nu_1} \right]^{-\frac{1}{2}(\nu_1 + p)} \left[ 1 + \frac{Q_2}{\nu_2} \right]^{-\frac{1}{2}(\nu_2 + p)} d\beta$


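The expression $\left[ 1 + \frac{Q_1}{\nu_1} \right]^{-\frac{1}{2}(\nu_1 + p)}$ can be written as

$\exp\left( -\frac{1}{2} Q_1 \right) \exp\left\{ \frac{1}{2} Q_1 - \frac{1}{2}(\nu_1 + p) \log\left( 1 + \frac{Q_1}{\nu_1} \right) \right\}$

Expand the second factor on the right in powers of $\nu_1^{-1}$:

(58)   $\left[ 1 + \frac{Q_1}{\nu_1} \right]^{-\frac{1}{2}(\nu_1 + p)} = \exp\left( -\frac{1}{2} Q_1 \right) \sum_{i=0}^{\infty} p_i\, \nu_1^{-i}$

where   $p_0 = 1$

        $p_1 = \frac{1}{4} \left[ Q_1^{2} - 2p\, Q_1 \right]$

        $p_2 = \frac{1}{96} \left[ 3 Q_1^{4} - 4(3p + 4) Q_1^{3} + 12p(p + 2) Q_1^{2} \right]$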


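Similarly,

(59)   $\left[ 1 + \frac{Q_2}{\nu_2} \right]^{-\frac{1}{2}(\nu_2 + p)} = \exp\left( -\frac{1}{2} Q_2 \right) \sum_{j=0}^{\infty} q_j\, \nu_2^{-j}$

where   $q_0 = 1$

        $q_1 = \frac{1}{4} \left[ Q_2^{2} - 2p\, Q_2 \right]$

        $q_2 = \frac{1}{96} \left[ 3 Q_2^{4} - 4(3p + 4) Q_2^{3} + 12p(p + 2) Q_2^{2} \right]$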
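
The accuracy of the two-term correction can be checked numerically. The sketch below, which uses only the Python standard library, compares the exact factor $(1 + Q/\nu)^{-(\nu + p)/2}$ with $\exp(-Q/2)$ times the series truncated after the $\nu^{-2}$ term, using the coefficients of first and second order stated above. The function names and the chosen values of $Q$ and $p$ are illustrative assumptions.

```python
import math

def exact_factor(Q, nu, p):
    """The t-type factor (1 + Q/nu)^(-(nu + p)/2)."""
    return (1.0 + Q / nu) ** (-(nu + p) / 2.0)

def series_factor(Q, nu, p):
    """exp(-Q/2) times the expansion truncated after the nu^-2 term."""
    c1 = (Q**2 - 2 * p * Q) / 4.0
    c2 = (3 * Q**4 - 4 * (3 * p + 4) * Q**3 + 12 * p * (p + 2) * Q**2) / 96.0
    return math.exp(-Q / 2.0) * (1.0 + c1 / nu + c2 / nu**2)

Q, p = 2.0, 3
errors = [abs(exact_factor(Q, nu, p) - series_factor(Q, nu, p))
          for nu in (10, 100, 1000)]
```

Because the first neglected term is of order $\nu^{-3}$, each tenfold increase in $\nu$ should reduce the error by roughly three orders of magnitude.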


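Substitute (58) and (59) into (57) and, after simplifying, obtain

(60)   $p(\beta \mid y_1, y_2) = w^{-1}\, \frac{|D|^{1/2}}{(2\pi)^{p/2}} \exp\left\{ -\frac{1}{2} Q(\beta, \bar\beta, D) \right\} \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} p_i\, q_j\, \nu_1^{-i} \nu_2^{-j}$

where $D = M_1 + M_2$, $\bar\beta = D^{-1}(M_1 \hat\beta_1 + M_2 \hat\beta_2)$ and

(61)   $w = \int_R \frac{|D|^{1/2}}{(2\pi)^{p/2}} \exp\left\{ -\frac{1}{2} Q(\beta, \bar\beta, D) \right\} \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} p_i\, q_j\, \nu_1^{-i} \nu_2^{-j}\, d\beta$     (Tiao & Zellner (1963))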

This integral can be integrated term by term. It appears from the
expressions (58) and (59) that each term is, in fact, a bivariate
polynomial in the mixed moments of the quadratic forms $Q_1 = Q(\beta, \hat\beta_1, M_1)$ and
$Q_2 = Q(\beta, \hat\beta_2, M_2)$, where the variables $\beta$ have a multivariate normal
distribution with mean $\bar\beta$ and covariance matrix $D^{-1}$.

Another, simpler method for obtaining the mixed moments is first to find the
mixed cumulants.

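The cumulant generating function of $Q_1$ and $Q_2$ is

(62)   $\kappa(t_1, t_2) = \log \int_R \frac{|D|^{1/2}}{(2\pi)^{p/2}} \exp\left[ t_1 Q_1 + t_2 Q_2 - \frac{1}{2} Q(\beta, \bar\beta, D) \right] d\beta$

       $= -\frac{1}{2} \log \left| I - 2 D^{-1}(t_1 M_1 + t_2 M_2) \right| + t_1 \eta_1' M_1 \eta_1 + t_2 \eta_2' M_2 \eta_2$

       $\quad + 2 (t_1 M_1 \eta_1 + t_2 M_2 \eta_2)' (D - 2 t_1 M_1 - 2 t_2 M_2)^{-1} (t_1 M_1 \eta_1 + t_2 M_2 \eta_2)$

where $\eta_1 = \bar\beta - \hat\beta_1$ and $\eta_2 = \bar\beta - \hat\beta_2$.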

Upon differentiating (62), one can find the various cumulants, the
general form of which is given below:

(63)   $\kappa_{rs} = 2^{r+s-1} (r+s-2)! \left[ (r+s-1)\, \mathrm{tr}\, D^{-1} G_{rs} + (r\eta_1 + s\eta_2)' G_{rs} (r\eta_1 + s\eta_2) - r\, \eta_1' G_{rs} \eta_1 - s\, \eta_2' G_{rs} \eta_2 \right]$,   $r + s \geq 2$

where   $G_{rs} = D (D^{-1} M_1)^{r} (D^{-1} M_2)^{s}$     (Tiao and Zellner (1963))
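
The closed form for the mixed cumulants can be checked against numerical derivatives of the cumulant generating function. The sketch below assumes NumPy; it implements the generating function of $Q_1 = Q(\beta, \hat\beta_1, M_1)$ and $Q_2 = Q(\beta, \hat\beta_2, M_2)$ for $\beta$ normal with mean $\bar\beta$ and covariance $D^{-1}$, together with the closed form for $r + s \geq 2$, and compares second-order central differences at the origin with the closed form. The function names `cgf` and `kappa`, the step size and the random test matrices are hypothetical choices made for illustration.

```python
import numpy as np
from math import factorial

def cgf(t1, t2, D, M1, M2, eta1, eta2):
    """Joint cumulant generating function of the two quadratic forms."""
    A = t1 * M1 + t2 * M2
    b = t1 * (M1 @ eta1) + t2 * (M2 @ eta2)
    _, logdet = np.linalg.slogdet(np.eye(len(D)) - 2.0 * np.linalg.solve(D, A))
    const = t1 * (eta1 @ M1 @ eta1) + t2 * (eta2 @ M2 @ eta2)
    return -0.5 * logdet + const + 2.0 * (b @ np.linalg.solve(D - 2.0 * A, b))

def kappa(r, s, D, M1, M2, eta1, eta2):
    """Closed-form mixed cumulant of order (r, s), valid for r + s >= 2."""
    Dinv = np.linalg.inv(D)
    G = (D @ np.linalg.matrix_power(Dinv @ M1, r)
           @ np.linalg.matrix_power(Dinv @ M2, s))
    u = r * eta1 + s * eta2
    bracket = ((r + s - 1) * np.trace(Dinv @ G) + u @ G @ u
               - r * (eta1 @ G @ eta1) - s * (eta2 @ G @ eta2))
    return 2 ** (r + s - 1) * factorial(r + s - 2) * bracket

# Arbitrary positive definite test matrices (illustrative only).
rng = np.random.default_rng(3)
A1, A2 = rng.normal(size=(6, 3)), rng.normal(size=(6, 3))
M1, M2 = A1.T @ A1, A2.T @ A2
D = M1 + M2
b1, b2 = rng.normal(size=3), rng.normal(size=3)
beta_bar = np.linalg.solve(D, M1 @ b1 + M2 @ b2)
args = (D, M1, M2, beta_bar - b1, beta_bar - b2)

h = 1e-4   # finite-difference step
fd20 = (cgf(h, 0, *args) - 2 * cgf(0, 0, *args) + cgf(-h, 0, *args)) / h**2
fd11 = (cgf(h, h, *args) - cgf(h, -h, *args)
        - cgf(-h, h, *args) + cgf(-h, -h, *args)) / (4 * h**2)
k20, k11 = kappa(2, 0, *args), kappa(1, 1, *args)
```

Only the second-order cumulants are compared here; higher orders could be checked in the same way at the cost of noisier finite differences.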

Employing the bivariate moment-cumulant formula as given by Cook (1951),
the integral in (61) can be written as

(64)   $w = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} b_{ij}\, \nu_1^{-i} \nu_2^{-j}$

where   $b_{00} = 1$

        $b_{10} = \frac{1}{4} \left[ \kappa_{20} + \kappa_{10}^{2} - 2p\, \kappa_{10} \right]$

        $b_{01} = \frac{1}{4} \left[ \kappa_{02} + \kappa_{01}^{2} - 2p\, \kappa_{01} \right]$

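Substituting the results in (64) into (60), one obtains

(65)   $p(\beta \mid y_1, y_2) = \frac{|D|^{1/2}}{(2\pi)^{p/2}} \exp\left\{ -\frac{1}{2} Q(\beta, \bar\beta, D) \right\} \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} d_{ij}\, \nu_1^{-i} \nu_2^{-j}$

where   $d_{00} = 1$

        $d_{10} = p_1 - b_{10}$

        $d_{01} = q_1 - b_{01}$

        $d_{11} = p_1 q_1 - p_1 b_{01} - q_1 b_{10} + 2 b_{10} b_{01} - b_{11}$

        $d_{20} = p_2 - b_{20} - p_1 b_{10} + b_{10}^{2}$

        $d_{02} = q_2 - b_{02} - q_1 b_{01} + b_{01}^{2}$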

The posterior distribution is expressed in the form of a multivariate normal
distribution multiplied by a power series in $\nu_1^{-1}$ and $\nu_2^{-1}$.
When $\nu_1$ and $\nu_2 \to \infty$, all terms of the power series except the leading
one vanish so that, in the limit, the posterior distribution is multivariate
normal with mean $\bar\beta$ and covariance matrix $D^{-1}$. For finite values of $\nu_1$
and $\nu_2$, the terms in the power series can be regarded as "corrections" in
a normal approximation to the multivariate "double-t" distribution.
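
The leading term of the expansion is easy to compute. As a minimal sketch, assuming NumPy, the fragment below forms $M_i = Z_i / s_i^{2}$, $D = M_1 + M_2$ and the precision-weighted mean $D^{-1}(M_1 \hat\beta_1 + M_2 \hat\beta_2)$ with covariance $D^{-1}$. The function name `normal_approximation` and all numerical inputs are hypothetical.

```python
import numpy as np

def normal_approximation(Z1, b1, s1_sq, Z2, b2, s2_sq):
    """Mean and covariance of the limiting normal form of the double-t posterior."""
    M1 = Z1 / s1_sq                 # precision contributed by sample 1
    M2 = Z2 / s2_sq                 # precision contributed by sample 2
    D = M1 + M2
    mean = np.linalg.solve(D, M1 @ b1 + M2 @ b2)
    cov = np.linalg.inv(D)
    return mean, cov

# Illustrative inputs: cross-product matrices and two estimates of beta.
rng = np.random.default_rng(5)
X1, X2 = rng.normal(size=(25, 2)), rng.normal(size=(40, 2))
Z1, Z2 = X1.T @ X1, X2.T @ X2
b1, b2 = np.array([1.0, 2.0]), np.array([1.2, 1.8])
mean, cov = normal_approximation(Z1, b1, 0.5, Z2, b2, 2.0)
```

When the two residual variances are equal the mean reduces to the pooled estimate with weights $Z_1$ and $Z_2$, and when one variance is very large the corresponding sample is effectively ignored.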
THE MARGINAL POSTERIOR DISTRIBUTION

When interest centers on a subset of the elements of $\beta$, say $\beta_{(l)} =
(\beta_1, \ldots, \beta_l)'$, an asymptotic expression for the corresponding marginal
posterior distribution can be obtained by integrating out the remaining
elements, $\beta_{(m)} = (\beta_{l+1}, \ldots, \beta_p)'$, from the joint distribution in (65).
One obtains

(66)   $p(\beta_{(l)} \mid y_1, y_2) = \frac{|D|^{1/2}}{(2\pi)^{p/2}} \int_R \exp\left\{ -\frac{1}{2} Q(\beta, \bar\beta, D) \right\} \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} d_{ij}\, \nu_1^{-i} \nu_2^{-j}\, d\beta_{(m)}$

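Denoting $\beta = (\beta_{(l)} \,\vdots\, \beta_{(m)})$ and partitioning the matrices $D$ and $D^{-1}$
into

$D = \begin{pmatrix} D_{ll} & D_{lm} \\ D_{ml} & D_{mm} \end{pmatrix}$,     $D^{-1} = \begin{pmatrix} V_{ll} & V_{lm} \\ V_{ml} & V_{mm} \end{pmatrix}$,

where the leading blocks have $l$ rows and the trailing blocks $p - l$ rows,
one can write the marginal posterior distribution as

(67)   $p(\beta_{(l)} \mid y_1, y_2) = \frac{|V_{ll}^{-1}|^{1/2}}{(2\pi)^{l/2}} \exp\left\{ -\frac{1}{2} Q(\beta_{(l)}, \bar\beta_{(l)}, V_{ll}^{-1}) \right\} f(\beta_{(l)} \mid y_1, y_2)$

where   $f(\beta_{(l)} \mid y_1, y_2) = \frac{|D_{mm}|^{1/2}}{(2\pi)^{(p-l)/2}} \int_R \exp\left\{ -\frac{1}{2} Q(\beta_{(m)}, \theta, D_{mm}) \right\} \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} d_{ij}\, \nu_1^{-i} \nu_2^{-j}\, d\beta_{(m)}$

with   $\theta = \bar\beta_{(m)} - D_{mm}^{-1} D_{ml} (\beta_{(l)} - \bar\beta_{(l)})$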
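
The partitioning step uses two standard facts about a normal vector with precision matrix $D$: the retained block has precision equal to the Schur complement $D_{ll} - D_{lm} D_{mm}^{-1} D_{ml}$, and the integrated-out block, given the retained one, is centred at $\bar\beta_{(m)} - D_{mm}^{-1} D_{ml}(\beta_{(l)} - \bar\beta_{(l)})$ with precision $D_{mm}$. A short numerical sketch, assuming NumPy and using a hypothetical function name `split_normal` and random inputs, is:

```python
import numpy as np

def split_normal(D, beta_bar, l, beta_l):
    """Marginal covariance of the first l elements and conditional mean of the rest."""
    D_ll, D_lm = D[:l, :l], D[:l, l:]
    D_ml, D_mm = D[l:, :l], D[l:, l:]
    V_ll = np.linalg.inv(D)[:l, :l]            # marginal covariance of beta_(l)
    theta = beta_bar[l:] - np.linalg.solve(D_mm, D_ml @ (beta_l - beta_bar[:l]))
    # Schur complement: the marginal precision of beta_(l), i.e. inv(V_ll).
    marg_precision = D_ll - D_lm @ np.linalg.solve(D_mm, D_ml)
    return V_ll, theta, marg_precision

# Random positive definite precision matrix (illustrative only).
rng = np.random.default_rng(4)
A = rng.normal(size=(8, 4))
D = A.T @ A
beta_bar = rng.normal(size=4)
l = 2
beta_l = np.array([0.3, -0.2])
V_ll, theta, marg_precision = split_normal(D, beta_bar, l, beta_l)
```

Working with the blocks of $D$ directly avoids inverting the full matrix when only the conditional centre is required.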
From the expression for $d_{ij}$ given in (65), each term in the integral
$f(\beta_{(l)} \mid y_1, y_2)$ is a bivariate polynomial in the quadratic forms
$Q(\beta, \hat\beta_1, M_1)$ and $Q(\beta, \hat\beta_2, M_2)$, where $\beta_{(l)}$ is considered fixed and $\beta_{(m)}$
has a multivariate normal distribution with mean $\theta$ and covariance matrix
$D_{mm}^{-1}$. Adopting the same procedure as in the previous section, partition

$\hat\beta_1 = (\hat\beta_{1(l)} \,\vdots\, \hat\beta_{1(m)})$,     $\hat\beta_2 = (\hat\beta_{2(l)} \,\vdots\, \hat\beta_{2(m)})$,

$M_1 = \begin{pmatrix} B_{ll} & B_{lm} \\ B_{ml} & B_{mm} \end{pmatrix}$,     $M_2 = \begin{pmatrix} C_{ll} & C_{lm} \\ C_{ml} & C_{mm} \end{pmatrix}$,

with $M_1^{-1}$ and $M_2^{-1}$ partitioned conformably.

The general form for the mixed cumulants of $Q(\beta, \hat\beta_1, M_1)$ and $Q(\beta, \hat\beta_2, M_2)$
is given below:

(68)   $\omega_{rs} = 2^{r+s-1} (r+s-2)! \left[ (r+s-1)\, \mathrm{tr}\, D_{mm}^{-1} H_{rs} + (r\gamma_1 + s\gamma_2)' H_{rs} (r\gamma_1 + s\gamma_2) - r\, \gamma_1' H_{rs} \gamma_1 - s\, \gamma_2' H_{rs} \gamma_2 \right]$,   $r + s \geq 2$

where   $H_{rs} = D_{mm} (D_{mm}^{-1} B_{mm})^{r} (D_{mm}^{-1} C_{mm})^{s}$

        $\gamma_1 = \theta - \hat\beta_{1(m)} + B_{mm}^{-1} B_{ml} \left[ \beta_{(l)} - \hat\beta_{1(l)} \right]$

        $\gamma_2 = \theta - \hat\beta_{2(m)} + C_{mm}^{-1} C_{ml} \left[ \beta_{(l)} - \hat\beta_{2(l)} \right]$
Using the result in (68), the marginal posterior distribution of $\beta_{(l)}$
can be expressed as

(69)   $p(\beta_{(l)} \mid y_1, y_2) = \frac{|V_{ll}^{-1}|^{1/2}}{(2\pi)^{l/2}} \exp\left\{ -\frac{1}{2} Q(\beta_{(l)}, \bar\beta_{(l)}, V_{ll}^{-1}) \right\} \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} \delta_{ij}\, \nu_1^{-i} \nu_2^{-j}$

where   $\delta_{00} = 1$

        $\delta_{10} = g_{10} - b_{10}$

        $\delta_{01} = g_{01} - b_{01}$

        $\delta_{11} = g_{11} - g_{10} b_{01} - g_{01} b_{10} + 2 b_{01} b_{10} - b_{11}$

        $\delta_{20} = g_{20} - b_{20} - g_{10} b_{10} + b_{10}^{2}$

        $\delta_{02} = g_{02} - b_{02} - g_{01} b_{01} + b_{01}^{2}$

The quantities $g_{ij}$ are functions of the mixed cumulants $\omega_{ij}$ with functional
relationships exactly the same as those between $b_{ij}$ and $\kappa_{ij}$ shown in (64).

It should be noted that when $\beta_{(l)}$ consists of one variable, the quantities $\delta_{ij}$
in (69) are simply polynomials in that variable. Employing the well-known
expression for the moments of a normal variable, one can easily derive an
asymptotic expression for the moments.

This finishes our discussion on the application of Bayes' Theorem in
Regression Analysis. For an illustrative example, the reader may refer to
the paper 'Bayes' Theorem and the Use of Prior Knowledge in Regression
Analysis' by George C. Tiao and Arnold Zellner, University of Wisconsin.

SUMMARY


The use of Bayes' Theorem in statistical inference has recently been
reconsidered in the works of Jeffreys, Savage, Box and Tiao, and others. One
advantage of a Bayesian approach is that prior knowledge about parameters of
interest can be combined in a well-defined mathematical way with information
obtained from experiments. Such prior knowledge may come either from some
general theoretical considerations or from the results of previous experience.

Through the use of Bayes' theorem one can obtain the posterior distribution
of a certain parameter on the basis of a likelihood function and the
prior distribution of that parameter. It has been shown that the best
Bayesian point estimate, under a quadratic loss function, is the mean of the
posterior distribution. The Bayesian solution of a design problem requires
that one looks beyond the prior distribution to the ultimate decisions that
will be made in the light of this distribution.

After assuming the form of the parent distribution, which is not necessarily
considered to be normal, but only a member of a class of symmetric
distributions which includes the normal, one can derive a criterion which is
appropriate on this assumption. For example, on the assumption of normality,
for the comparison of two means one would derive the t-statistic. It seems
natural to justify the use of such a normal theory criterion in the practical
circumstances in which normality cannot be guaranteed. These situations lead
one to adopt Bayes' method for the solution of such problems where normality
cannot be assumed.

The Bayesian approach emphasizes the fact that, given the likelihood
function and the prior distribution of a certain parameter, one can find
the posterior distribution of that parameter by using Bayes' theorem.
One of the advantages is that the condition of normality need not be assumed
while deriving certain distributions. By use of Bayes' theorem, it has been
shown how prior knowledge can be utilized in conjunction with sample
information in making inferences about the parameters of the regression model.

The 't' distribution was derived given the likelihood function and
prior distributions for the parameters $\mu$ and $\sigma$. It was also shown that in
sampling from a parent distribution which is a member of a class of symmetric
distributions, one can find the posterior distribution of the mean
$\mu$ by integrating out $\sigma$ from $p(\mu, \sigma \mid y, \beta)$, for any fixed $\beta$.

A two-sample problem was considered where the two samples are drawn
from specified populations with location parameters $\mu_1$ and $\mu_2$, scale
parameters $\sigma_1$ and $\sigma_2$, and a common non-normality parameter $\beta$. Assume that $\mu_1$
and $\mu_2$ are known. Let $V$ denote the ratio of the squared scale parameters
$\sigma_1^{2}$ and $\sigma_2^{2}$, and let $\beta$ correspond to the nuisance
parameter in our general formulation. One can then study $p(V \mid \beta, y)$, the
conditional posterior distribution of the squared scale parameter ratio, for
any chosen degree of non-normality, together with the associated $p(\beta \mid y)$ which
indicates the plausibility of that value. The posterior density $p(\beta \mid y)$ can
be written as the product of a data-dependent factor and the prior density
$p(\beta)$, whose elements are associated with (i) the information concerning
non-normality coming from the data and (ii) that injected a priori.

Further, if one removes the assumption that $\mu_1$ and $\mu_2$ are known, then
the problem involves two laborious integrations. But a close approximation
to the integrand can be obtained by replacing the unknown $\mu_1$ and $\mu_2$ by their
maximum likelihood estimators in the integrand and changing the degrees of
freedom by one unit.

In the case of a regression model, attention has been directed at developing
procedures for using information from one sample as prior knowledge in
the analysis of a subsequent sample. It is assumed that the two samples
drawn from the population have unequal variances. Given a regression model
with specified coefficient vector $\beta$, one can write the likelihood function,
which can again be utilized with Bayes' theorem in the development of the
a posteriori distribution of $\beta$. Suppose that the likelihood function for
the ith sample is in the form

$\ell(\beta, \sigma \mid y) = \left( \frac{1}{\sqrt{2\pi}\,\sigma} \right)^{T} \exp\left[ -\frac{1}{2\sigma^{2}} (y - X\beta)'(y - X\beta) \right]$

with parameters $(\beta, \sigma_i)$ and data $(y_i, X_i, T_i)$, $i = 1, \ldots, k$. By taking the
$\sigma_i$'s as independent scale parameters, one can find the posterior distribution
of $\beta$ as a product of k factors, each of which can be expressed as a
multivariate 't' distribution.

In the above case, the problem of $p$-dimensional integration is laborious
and difficult. This can be remedied by expanding the posterior distribution
into an asymptotic series, thus reducing the problem of integration
to a problem of evaluating the mixed moments of two quadratic
forms.


